Marks old storages as dirty in clean_accounts() #3702
Conversation
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements, and documentation updates on a case by case basis.
```rust
let acceptable_straggler_slot_count = 100;
let old_slot_cutoff =
    slot_one_epoch_old.saturating_sub(acceptable_straggler_slot_count);
let (old_storages, old_slots) = self.get_snapshot_storages(..old_slot_cutoff);
```
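As a rough standalone illustration of the cutoff arithmetic quoted above (the slot values below are hypothetical, not taken from a real validator):

```rust
// Hypothetical sketch of the cutoff computation discussed above.
// The straggler allowance of 100 slots comes from the quoted diff;
// the input slot numbers are illustrative only.
fn old_slot_cutoff(slot_one_epoch_old: u64) -> u64 {
    let acceptable_straggler_slot_count = 100;
    // saturating_sub avoids underflow early in the chain's life,
    // when fewer than 100 slots exist below the epoch-old slot.
    slot_one_epoch_old.saturating_sub(acceptable_straggler_slot_count)
}

fn main() {
    // Storages strictly below the cutoff are considered "old".
    assert_eq!(old_slot_cutoff(432_000), 431_900);
    // Very early slots saturate to 0 instead of underflowing.
    assert_eq!(old_slot_cutoff(42), 0);
    println!("cutoff examples ok");
}
```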
I was originally worried about the cost of getting these snapshot storages, but that worry was from about 3 years ago. I assume this isn't too expensive to run every clean? Clean runs only once per ~100s or so, so I imagine this is fine. Not sure how else you'd get this. Previously I was piggybacking off the storages collected for the hash of all accounts.
Yeah, it would be nice to not fetch the storages unnecessarily. I've been running this on a validator for a few hours now.
Yes, clean normally runs once every ~45-50 seconds (~100 slots). The median time to get the storages here has been 23 milliseconds, with a common range of 21-28 milliseconds.
Another option would be to inspect these old storages less often.
It is nice that once we skip rewrites this will all go away, and that does feel close now.
jeffwashington left a comment
LGTM. Bummer we introduced this side effect. When we eliminate rewrites, all of this will go away.
(cherry picked from commit bf33b8c)
Conflicts: accounts-db/src/accounts_db.rs, accounts-db/src/accounts_db/tests.rs, runtime/src/bank.rs
(cherry picked from commit bf33b8c)
Conflicts: accounts-db/src/accounts_db/tests.rs
Problem
We do not clean up old storages.
More context: when calculating a full accounts hash, we call `mark_old_slots_as_dirty()` as a way to ensure we do not forget or miss cleaning up really old storages (i.e. ones that are more than an epoch old). But when we enable skipping rewrites, we don't want to clean up those old storages, as they'll intentionally be treated as ancient append vecs. So inside `mark_old_slots_as_dirty()` we conditionally mark old slots as dirty, based on the value of `ancient_append_vec_offset`, which should be None unless ancient append vecs are enabled.

Unfortunately, on normally running validators, we end up never marking old slots as dirty, because the ancient append vec offset is always Some. And thus we never clean up old storages.
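A minimal sketch of the problematic gating described above (the struct and method are simplified stand-ins for the real accounts-db code, not the actual implementation):

```rust
// Simplified stand-in for the pre-PR behavior described above.
// On a normally running validator `ancient_append_vec_offset` is Some,
// so old slots were never marked dirty and thus never cleaned.
struct AccountsDb {
    ancient_append_vec_offset: Option<i64>,
}

impl AccountsDb {
    fn mark_old_slots_as_dirty(&self, old_slots: &[u64], dirty: &mut Vec<u64>) {
        // Old gating: only mark when ancient append vecs are disabled.
        if self.ancient_append_vec_offset.is_none() {
            dirty.extend_from_slice(old_slots);
        }
    }
}

fn main() {
    // In practice the offset is always Some, so nothing is ever marked.
    let db = AccountsDb { ancient_append_vec_offset: Some(-10_000) };
    let mut dirty = Vec::new();
    db.mark_old_slots_as_dirty(&[1, 2, 3], &mut dirty);
    assert!(dirty.is_empty());
    println!("old gating marks nothing when offset is Some");
}
```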
Summary of Changes
Mark old storages as dirty in clean_accounts().
We still check whether ancient append vecs are enabled, but not via `ancient_append_vec_offset`. Instead we look at the skip-rewrites feature gate and the CLI arg.

By moving this marking into `clean_accounts()`, we also decouple it from accounts hash calculation, where it is no longer necessary. This also removes behavioral differences based on whether snapshots are enabled or not.
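The new gating can be sketched as a pure predicate (function and parameter names here are illustrative, not the real accounts-db identifiers):

```rust
// Hypothetical sketch of the new check described in the summary:
// consult the skip-rewrites feature gate and CLI flag directly rather
// than inferring intent from `ancient_append_vec_offset`.
fn should_mark_old_storages_dirty(
    skip_rewrites_feature_active: bool,
    skip_rewrites_cli_enabled: bool,
) -> bool {
    // Only mark old storages dirty (so clean can drop them) when rewrites
    // are NOT being skipped; otherwise old storages are intentionally
    // kept around to become ancient append vecs.
    !(skip_rewrites_feature_active || skip_rewrites_cli_enabled)
}

fn main() {
    // Rewrites not skipped: old storages get marked dirty and cleaned.
    assert!(should_mark_old_storages_dirty(false, false));
    // Skipping rewrites via feature gate or CLI arg: leave them alone.
    assert!(!should_mark_old_storages_dirty(true, false));
    assert!(!should_mark_old_storages_dirty(false, true));
    println!("new gating ok");
}
```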